1. Exploratory analysis


1.1 Data Inputs

NOAA SEFSC visual data goes back to 1992, but as shown in the figure below, many predictor variables are only available starting in 2003, therefore earlier visual data is currently excluded from further analyses.

Note: Future work could use monthly climatologies (averages) so that older sightings data could be used. Some dynamic drivers like eddy and front locations would not be able to be considered using that approach.

Visual data predictor variable availability:



1.1.1 Testing and Training Sets

The data are split into training and testing sets. In this case, data from 2009 and 2013 are used for testing. Only observations post-2003 are used for modeling due to covariate limitations.


1.1.2 Preliminary Mapping

The visual data selected for modeling are displayed on the map below. Data from 2009 were held back for testing. Blue markers indicate HARP locations.


The time series below show the acoustic data used for modeling. Data from 2011 and 2012 used for training, and 2013 data is held back for testing. These density magnitudes are very preliminary,and models only used presence absence data.


Acoustic Timeseries:


1.2 Examine Covariates

Identify Outliers

Replace extreme outliers (bad data) with NaNs.


1.2.1 Check Covariate Distributions

Covariates have different distributions across the observations.

Distributions of covariates from acoustic observations:


Distributions of covariates from the visual observations:


Some of these covariates are more or less interrelated. HYCOM estimates of salinity at the surface (HYCOM_SALIN_0) and at 100m (HYCOM_SALIN_100) are very similar, so I selected surface salinity for simplicity. HYCOM current and upwelling estimates at 100m were also removed (HYCOM_MAG_100,HYCOM_UPVEL_100). HYCOM current direction is site specific in the acoustic data, so it was excluded for now.

Remaining correlations are examined in the figure below. Numbers closer to 1 above the diagonal in the figure below represent correlation coefficients. Highly correlated covariates should not be used together in the same model. Day of year was excluded to avoid artifacts associated with the temporal differences of the datasets (Acoustic data are year-round, and visual data are collected in spring/summer).


Covariate Correlations:


1.2.2 Transform Predictor Variables

Some variables, including chlorophyll, current magnitude, mixed layer depth and distance to fronts are highly skewed and were log-transformed.



Below, the two sets of covariates have been combined and transformed:


1.2.3 Check Predictors

To get an idea of the basic predictive power of these covariates, we can look at presence/absence relative to each variable. This also provides an opportunity to look at the range of values observed for each covariate in the visual and acoustic datasets. In the plots below dotted lines indicate the distribution of each covariate when animals were present, and solid lines indicate the distribution when animals were absent. Note that these plots do not account for effort.


Acoustic kernel densities:



Visual kernel densities:



1.2.4 Estimate Relative Weights

To train the model, we need to know how much power the various data points have relative to one another. If an animal is sighted or heard, we know for certain that the species was present. However, if it was not heard, then it either wasn’t present, or it was present but missed. For each data type, we estimated the probability of a missed detection to downweight zeros in the presence absence data.

The visual data represent wether or not Kogia spp. were seen during each transect segment. The probability of missing a sighting of risso’s dolphin was estimated as \[P_{V}(detect|present) = \mu_{det} * g0\ = 0.4822517 * 0.42 = 0.2025457\]

where _{det} is the mean detection probabiltiy as estimated by a model fit using the mrds package, and g0 is the probability of observing an animal on the transect line. We assume that reported absences are likely to be true absences X % of the time, therefore zeros are given a wieght of X on a scale of [0,1].

The acoustic data represent presence or absence of Kogia spp. detections in one day bins. Given that a group of dolphins is present near the sensor, the probability of detecting them in a 5 minute period is estimated at 0.65, therefore the probability of missing is 1-0.65 = 0.35. Given that animals were present, the probability of missing a group for a full day is estimated as \[P_{A}(detect|present) = 1-(1-0.65)^{225}) \approx 1\]

Therefore we assume that there are no false negative days in the passive acoustic timeseries, and all observations are given weight = 1.


Best visual detection probability model:


2. Model Fitting

Models were fit using avNNet from the caret package in R.


2.1 Run Models

Run NNs Acoustic only, Visual only, and joint Acoustic/Visual datasets.

Models have the following characteristics:

  • R trainRepeats averaged repeats with random node initalization

  • Include 9 covariates

  • One hidden layer

  • Weighted training data

  • Hidden node layer sizes from 4 to 14 were tested in 2 node increments to search for optimal network size.


## ACOUSTIC ONLY
AcCounter <- 0
f.AcOnly_NN1 <- as.formula(paste("yAcOnly_TF ~", paste(n[model1.indices], collapse = " + ")))
# Iterate over a range of hidden layer sizes between 2 and 14 nodes.

for (layerSize in layerSizeList){
  AcCounter <- AcCounter + 1  
  # put together the formula
  # train network
  nn_AcOnly[[AcCounter]] <- avNNet(f.AcOnly_NN1, data=AcOnly_train_scaled, 
                                 size = layerSize, 
                                 repeats = trainRepeats,
                                 na.action = na.omit,
                                 rang = 0.7, 
                                 decay = 0.0001, 
                                 maxit = 1000,
                                 trace = FALSE)
  
  # predict on train data and estimate Mean Squared Error (MSE)
  pr.nn_AcOnly_train[[AcCounter]] <- predict(nn_AcOnly[[AcCounter]],AcOnly_train_scaled[,model1.indices])
  MSE.nn_AcOnly_train[[AcCounter]] <- sum((AcOnly_train_scaled$yAcOnly_TF - 
                                    pr.nn_AcOnly_train[[AcCounter]])^2)/nrow(AcOnly_train_scaled[model1.indices])
  # predict on test data and estimate MSE
  pr.nn_AcOnly_test[[AcCounter]] <- predict(nn_AcOnly[[AcCounter]],AcOnly_test_scaled[,model1.indices])
  MSE.nn_AcOnly_test[[AcCounter]] <- sum((AcOnly_test_scaled$yAcOnly_TF - 
                                   pr.nn_AcOnly_test[[AcCounter]])^2)/nrow(AcOnly_test_scaled[model1.indices])
  cat(paste("Done with AcOnly model iteration ",AcCounter, " of ", length(layerSizeList),": Layer Size = ", layerSize, "\n"))

}
## VISUAL ONLY
modelCounter <- 0
# put together the formula
f.VisOnly_NN1 <- as.formula(paste("yVisOnly_TF ~", paste(n[model1.indices], collapse = " + ")))
f.Joint_NN1 <- as.formula(paste("y_TF ~", paste(n[model1.indices], collapse = " + ")))
for (layerSize in layerSizeList){
  modelCounter <- modelCounter + 1  
  # train network
  nn_VisOnly[[modelCounter]] <- avNNet(f.VisOnly_NN1, VisOnly_train_scaled,
                                       weights = VisOnly_train_scaled$weightsG0,  
                                       size = layerSize, 
                                       repeats = trainRepeats,
                                       na.action = na.omit,
                                       rang = 0.7,  
                                       decay = 0.0001, 
                                       maxit = 10000,
                                       trace = FALSE)
                                       # weights = 
  
  # predict on train data and estimate MSE
  pr.nn_VisOnly_train[[modelCounter]] <- predict(nn_VisOnly[[modelCounter]],VisOnly_train_scaled[,model1.indices])
  MSE.nn_VisOnly_train[[modelCounter]] <- sum(VisOnly_train_scaled$weightsG0*
                                                (VisOnly_train_scaled$yVisOnly_TF - 
                                                pr.nn_VisOnly_train[[modelCounter]])^2)/
                                                nrow(VisOnly_train_scaled[model1.indices])
  
  # predict on test data and estimate MSE
  pr.nn_VisOnly_test[[modelCounter]] <- predict(nn_VisOnly[[modelCounter]],VisOnly_test_scaled[,model1.indices])
  MSE.nn_VisOnly_test[[modelCounter]] <- sum(VisOnly_test_scaled$weightsG0*
                                               (VisOnly_test_scaled$yVisOnly_TF - 
                                                pr.nn_VisOnly_test[[modelCounter]])^2)/
                                                nrow(VisOnly_test_scaled[model1.indices])
  
  ## JOINT
  nn_Joint[[modelCounter]]  <- avNNet(f.Joint_NN1, Joint_train_scaled,
                                      weights = Joint_train_scaled$weightsG0,
                                      size = layerSize, 
                                      repeats = trainRepeats, 
                                      na.action = na.omit,
                                      rang = 0.7,  
                                      decay = 0.0001, 
                                      maxit = 10000,
                                      trace = FALSE)                                     

  
  pr.nn_Joint_train[[modelCounter]]  <- predict(nn_Joint[[modelCounter]] ,Joint_train_scaled[,model1.indices],na.action=na.omit)
  MSE.nn_Joint_train[[modelCounter]]  <- sum(Joint_train_scaled$weightsG0*(Joint_train_scaled$y_TF - 
                                                pr.nn_Joint_train[[modelCounter]])^2)/
                                                nrow(Joint_train_scaled[model1.indices])
  
  pr.nn_Joint_test[[modelCounter]]  <- predict(nn_Joint[[modelCounter]],Joint_test_scaled[,model1.indices],na.action=na.omit)
  MSE.nn_Joint_test[[modelCounter]]  <- sum(Joint_test_scaled$weightsG0*(Joint_test_scaled$y_TF - 
                                               pr.nn_Joint_test[[modelCounter]])^2)/
                                               nrow(Joint_test_scaled[model1.indices])
  cat(paste("Done with VisOnly and Joint model iteration ",modelCounter, " of ", length(layerSizeList),": Layer Size = ", layerSize, "\n"))
}



2.2 Model Comparisons

Models were compared using a Kolmogorov-Smirnov test to compare predicted and observed presence/absence in the test data.

## [1] "cross entropy scores (lower is better)"
##                       4      6      8     10     12     14
## Acoustic - Train  9.569  8.531  7.452  6.663  5.874  4.691
## Acoustic - Test  14.647 14.251 14.993 14.548 14.597 14.993
## Visual - Train    0.270  0.170  0.170  0.128  0.075  0.142
## Visual - Test     0.231  0.231  0.278  0.278  0.231  0.231
## Joint - Train     5.264  4.825  4.393  4.008  3.781  3.351
## Joint - Test      9.402  9.124  9.323  9.589  9.505  9.529


2.3 Variable Importance

For the best model in each category, the importance of each input variable was calculated across the 50 model iterations.

AcOnly VisOnly Joint
SST 12.8 13.1 11.5
SSH 21.5 9.6 18.3
log10_CHL 20.0 9.5 22.5
log10_HYCOM_MLD 4.4 13.7 4.8
HYCOM_SALIN_0 10.6 8.6 12.4
log10_HYCOM_MAG_0 6.7 11.2 6.9
HYCOM_UPVEL_50 9.4 10.7 8.1
Neg_EddyDist 6.8 14.9 7.6
Pos_EddyDist 7.9 8.6 7.8

Example network:


3. Predict on Test Data

3.1 Temporal prediction

Predictions were made on the acoustic test dataset, and compared with actual observations for 2013. The predicted probabliity of encountering animals was compared with the actual weekly rate of occurrence of animals at a site.

CAVEAT: Encounter probability from the data is estimated as fraction of days per week during which this species was detected.


3.1.1 Acoustic Only Prediction

Predicted and observed encounter probabilities at passive acoustic sites using the acoustic-only model (Site order: MC, GC, DT):


3.1.2 Visual Only Prediction

Predicted and observed encounter probabilities at passive acoustic sites using the visual-only model (Site order: MC, GC, DT, DC, MP):


3.1.3 Joint Prediction

Predicted and observed encounter probabilities at passive acoustic sites using the joint model (Site order: MC, GC, DT, DC, MP):


3.2 Spatial Prediction

Models were evaluated for summer (July 2009) and winter(January 2009) across the entire Gulf of Mexico (US EEZ beyond the 200m contour).


Summer 2009 predicted distribution and test sightings:

## OGR data source with driver: ESRI Shapefile 
## Source: "E:\NASData\GU2009Effort\GU_Effort_Merge_clip_Project.shp", layer: "GU_Effort_Merge_clip_Project"
## with 280 features
## It has 8 fields
## Integer64 fields read as strings:  Id

Summer 2009 prediction uncertainty:


Summer 2009 predicted probability of sighting and test sightings:

Winter 2009 predicted distribution:

Winter 2009 prediction uncertainty:


4. Monthly model predictions

Spatial model predictions were generated using oceanographic variables averaged by month between 2003 and 2015.